Smart Paper Notes

AugmentedPCA: A Python Package of Supervised and Adversarial Linear Factor Models

William E. Carson IV , Austin Talbot , David Carlson

Categories: Machine Learning (Statistics) | Machine Learning

2022-01-07

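Deep autoencoders are often extended with a supervised or adversarial loss to learn latent representations with desirable properties, such as greater predictivity of labels and outcomes or fairness with respect to a sensitive variable. Despite the ubiquity of supervised and adversarial deep latent factor models, these methods should demonstrate an improvement over simpler linear approaches to be preferred in practice. This calls for a reproducible linear analog that still adheres to an augmenting supervised or adversarial objective. We address this methodological gap by presenting methods that augment the principal component analysis (PCA) objective with either a supervised or an adversarial objective, and we provide analytic and reproducible solutions. We implement these methods in an open-source Python package, AugmentedPCA, that can produce excellent real-world baselines. We demonstrate the utility of these factor models on an open-source RNA-seq cancer gene expression dataset, showing that augmenting with a supervised objective improves downstream classification performance, produces principal components with greater class fidelity, and facilitates identification of genes that align with the principal axes of data variance and have implications for the development of specific types of cancer.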
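
To make the idea of augmenting the PCA objective with a supervised term more concrete, here is a minimal NumPy sketch. It does not use the AugmentedPCA package and is not the authors' formulation: the objective (top eigenvectors of X^T X + mu * X^T Y Y^T X), the function name `supervised_pca_components`, and the weight `mu` are all assumptions chosen only to show that such an augmented linear objective can still have a closed-form, reproducible solution.

```python
import numpy as np


def supervised_pca_components(X, Y, n_components=2, mu=1.0):
    """Illustrative only: top eigenvectors of X^T X + mu * X^T Y Y^T X.

    With mu = 0 this reduces to ordinary PCA on the centered data.
    Larger mu pulls the components toward directions that covary with Y.
    """
    Xc = X - X.mean(axis=0)          # center features
    Yc = Y - Y.mean(axis=0)          # center labels
    C = Xc.T @ Yc                    # unnormalized cross-covariance, shape (p, q)
    M = Xc.T @ Xc + mu * (C @ C.T)   # symmetric positive semidefinite, shape (p, p)
    eigvals, eigvecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest ones
    return eigvecs[:, order]


# Synthetic data (hypothetical): the label depends only on feature 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = (X[:, [3]] > 0).astype(float)

W_pca = supervised_pca_components(X, Y, n_components=1, mu=0.0)    # plain PCA
W_sup = supervised_pca_components(X, Y, n_components=1, mu=100.0)  # label-aware
```

In this toy setup the label-aware component loads mostly on the feature that drives the label, while the plain PCA component has no reason to. Because the solution is an eigendecomposition, rerunning it on the same data gives the same subspace.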

translated by Google Translate

Estimating Potential Outcome Distributions with Collaborating Causal Networks

Tianhui Zhou , William E Carson IV , David Carlson

Categories: Machine Learning (Statistics) | Machine Learning

2021-10-04

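Traditional causal inference approaches use observational study data to estimate the difference between the observed and unobserved outcomes of a potential treatment, known as the Conditional Average Treatment Effect (CATE). However, CATE corresponds to a comparison of the first moment alone, and so may be insufficient to reflect the full picture of treatment effects. As an alternative, estimating the full potential outcome distributions could provide greater insight. However, existing methods for estimating potential outcome distributions often impose restrictive or simplistic assumptions on these distributions. Here, we propose Collaborating Causal Networks (CCN), a novel methodology that goes beyond the estimation of CATE by learning the full potential outcome distributions. Estimating outcome distributions via the CCN framework does not require restrictive assumptions about the underlying data-generating process. Additionally, CCN facilitates estimation of the utility of each possible treatment and permits individual-specific variation through utility functions. CCN not only extends outcome estimation beyond the traditional risk difference, but also enables a more comprehensive decision-making process through the definition of flexible comparisons. Under assumptions commonly made in the causal literature, we show that CCN learns distributions that asymptotically capture the true potential outcome distributions. Furthermore, we propose an adjustment approach that is empirically effective in alleviating sample imbalance between treatment groups in observational data. Finally, we evaluate the performance of CCN in multiple synthetic and semi-synthetic experiments. We demonstrate that CCN learns improved distribution estimates compared with existing Bayesian and deep generative methods, as well as improved decisions with respect to a variety of utility functions.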
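
The point that a first-moment comparison can hide decision-relevant differences can be shown with a small worked example. The sketch below is not CCN: the outcome samples, the treatment names, and the square-root utility are hypothetical values picked so that the two treatments have identical means but different expected utilities.

```python
import math

# Hypothetical potential-outcome samples for one individual under two treatments.
outcomes = {
    "treatment_a": [4.0, 5.0, 6.0],   # low spread
    "treatment_b": [0.0, 5.0, 10.0],  # same mean, high spread
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def expected_utility(values, utility):
    """Average utility over the outcome samples."""
    return mean([utility(v) for v in values])


def risk_averse(y):
    """Assumed concave utility: extra outcome matters less at high levels."""
    return math.sqrt(y)


# First-moment comparison (what a CATE-style estimate summarizes).
mean_difference = mean(outcomes["treatment_b"]) - mean(outcomes["treatment_a"])

# Distribution-aware comparison through a utility function.
utility_a = expected_utility(outcomes["treatment_a"], risk_averse)
utility_b = expected_utility(outcomes["treatment_b"], risk_averse)
```

Here `mean_difference` is zero, so a mean-only comparison is indifferent between the treatments, yet the risk-averse utility prefers `treatment_a`. Telling the two apart requires the outcome distribution, not only its mean.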

Supervising the Decoder of Variational Autoencoders to Improve Scientific Utility

Liyun Tu , Austin Talbot , Neil Gallagher , David Carlson

Categories: Machine Learning (Statistics) | Machine Learning

2021-09-09

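Probabilistic generative models are attractive for scientific modeling because their inferred parameters can be used to generate hypotheses and design experiments. This requires that the learned model provide an accurate representation of the input data and yield a latent space that effectively predicts outcomes relevant to the scientific question. Supervised Variational Autoencoders (SVAEs) have previously been used for this purpose: a carefully designed decoder can serve as an interpretable generative model, while the supervised objective ensures a predictive latent representation. Unfortunately, the supervised objective forces the encoder to learn a biased approximation to the generative posterior distribution, which makes the generative parameters unreliable when used in scientific models. This issue has gone undetected because the reconstruction losses commonly used to evaluate model performance do not reveal it. We address this previously unreported issue by developing a second-order supervision framework (SOS-VAE) that influences the decoder to induce a predictive latent representation. This ensures that the associated encoder maintains a reliable generative interpretation. We extend this technique to let the user trade off some bias in the generative parameters for improved predictive performance, acting as an intermediate option between SVAEs and our new SOS-VAE. We also use this methodology to address the missing-data issues that often arise when combining recordings from multiple scientific experiments. We demonstrate the effectiveness of these developments using synthetic data and electrophysiological recordings, with an emphasis on how our learned representations can be used to design scientific experiments.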

Estimating Uncertainty Intervals from Collaborating Networks

Tianhui Zhou , Yitong Li , Yuan Wu , David Carlson

Categories: Machine Learning (Statistics) | Machine Learning

2020-02-12

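Effective decision making requires understanding the uncertainty inherent in a prediction. In regression, this uncertainty can be estimated by a variety of methods; however, many of these methods are laborious to tune, generate overconfident uncertainty intervals, or lack sharpness (give imprecise intervals). We address these challenges by proposing a novel method that captures predictive distributions in regression by defining two neural networks with two distinct loss functions. Specifically, one network approximates the cumulative distribution function, and the second network approximates its inverse. We refer to this method as Collaborating Networks (CN). Theoretical analysis demonstrates that a fixed point of the optimization is at the idealized solution, and that the method is asymptotically consistent with the ground-truth distribution. Empirically, learning is straightforward and robust. We benchmark CN against several common approaches on two synthetic and six real-world datasets, including forecasting A1c values in diabetic patients from electronic health records, where uncertainty is critical. On the synthetic data, the proposed approach essentially matches the ground truth. On the real-world datasets, CN improves results on many performance metrics, including log-likelihood estimates, mean error, coverage estimates, and prediction interval widths.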
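
The role of the two functions (a cumulative distribution function and its inverse) can be illustrated without any neural networks. In the sketch below a closed-form Gaussian from the Python standard library stands in for both learned networks. The mean of 7.0 and standard deviation of 0.5 are hypothetical values for a single input, and the helper names are assumptions for illustration only.

```python
from statistics import NormalDist

# Hypothetical predictive distribution for one input (stand-in for learned networks).
predictive = NormalDist(mu=7.0, sigma=0.5)


def cdf(y):
    """Plays the role of the network approximating the CDF: P(Y <= y | x)."""
    return predictive.cdf(y)


def inverse_cdf(q):
    """Plays the role of the network approximating the inverse CDF (quantiles)."""
    return predictive.inv_cdf(q)


def central_interval(coverage):
    """Read a central prediction interval directly off the inverse CDF."""
    tail = (1.0 - coverage) / 2.0
    return inverse_cdf(tail), inverse_cdf(1.0 - tail)


lower, upper = central_interval(0.90)

# At the idealized solution the two functions undo each other.
round_trip = cdf(inverse_cdf(0.25))
```

An interval at any coverage level comes from two evaluations of the inverse function, and the CDF can be used to check that the interval actually covers the requested probability mass.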

Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning

Thanh Le-Cong , Duc-Minh Luong , Xuan Bach D. Le , David Lo , Nhat-Hoa Tran , Bui Quang-Huy , Quyet-Thang Huynh

Categories: Machine Learning

2023-01-03

In this paper, we propose a novel technique, INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, and it captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if it (1) violates correct specifications or (2) maintains erroneous behaviors of the original buggy program. If our approach cannot identify an overfitting patch based on invariants, INVALIDATOR uses a model trained on labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is three-fold. First, INVALIDATOR leverages both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated; it relies only on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experimental results show that INVALIDATOR correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
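
The two invariant-based overfitting conditions can be sketched with plain set operations. This is one possible reading of the abstract, not INVALIDATOR's implementation. It assumes that invariants are comparable strings, that "correct specifications" are invariants shared by the buggy and developer-patched programs, and that "error behaviors" are invariants that hold only in the buggy program. The function name and example invariants are hypothetical.

```python
def classify_patch(buggy_inv, correct_inv, patched_inv):
    """Return a reason string if the patch looks overfitting, else None.

    All three arguments are sets of invariant strings (hypothetical encoding).
    None means the invariant check is inconclusive.
    """
    # Assumed: behavior kept by the developer fix is a correct specification.
    correct_specs = buggy_inv & correct_inv
    # Assumed: behavior removed by the developer fix is an error behavior.
    error_behaviors = buggy_inv - correct_inv

    if not correct_specs <= patched_inv:
        return "violates correct specification"
    if error_behaviors & patched_inv:
        return "maintains error behavior"
    return None


# Hypothetical invariants for a function that should return x + 1.
buggy = {"x >= 0", "result == x"}
correct = {"x >= 0", "result == x + 1"}

good_patch = classify_patch(buggy, correct, {"x >= 0", "result == x + 1"})
same_bug = classify_patch(buggy, correct, {"x >= 0", "result == x"})
broke_spec = classify_patch(buggy, correct, {"result == x + 1"})
```

A `None` result corresponds to the case the abstract describes where invariants alone cannot flag the patch and a learned syntactic model takes over.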

Conservation Tools: The Next Generation of Engineering--Biology Collaborations

Andrew Schulz , Cassie Shriver , Suzanne Stathatos , Benjamin Seleb , Emily Weigel , Young-Hui Chang , M. Saad Bhamla , David Hu , Joseph R. Mendelson III , .

Categories: Machine Learning

2023-01-03

The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.

Posterior Collapse and Latent Variable Non-identifiability

Yixin Wang , David M. Blei , John P. Cunningham

Categories: Machine Learning (Statistics) | Machine Learning

2023-01-02

Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
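
A toy discrete example shows why a latent variable that the likelihood cannot distinguish yields a posterior equal to the prior, even with exact inference. This is not the paper's model or proof. The two-state latent variable, the prior, and the likelihood values are hypothetical, and only the direction "likelihood ignores z, so the posterior collapses" is illustrated.

```python
def posterior(prior, likelihood):
    """Exact Bayes rule for a discrete latent z: p(z | x) from p(z) and p(x | z)."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(joint)
    return [j / evidence for j in joint]


prior = [0.25, 0.75]            # p(z = 0), p(z = 1)

# p(x | z) for one observed x under two hypothetical models.
non_identifiable = [0.5, 0.5]   # likelihood does not depend on z
identifiable = [0.8, 0.2]       # likelihood distinguishes the values of z

collapsed = posterior(prior, non_identifiable)
informative = posterior(prior, identifiable)
```

With the flat likelihood, `collapsed` equals `prior` exactly, so observing x teaches nothing about z. With the second likelihood the posterior moves away from the prior. No neural network or variational approximation is involved, which matches the abstract's claim that collapse can arise in classical models.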

Mapping smallholder cashew plantations to inform sustainable tree crop expansion in Benin

Leikun Yin , Rahul Ghosh , Chenxi Lin , David Hale , Christoph Weigl , James Obarowski , Junxiong Zhou , Jessica Till , Xiaowei Jia , Troy Mao

Categories: Computer Vision | Machine Learning

2023-01-01

Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.

Morphology-based non-rigid registration of coronary computed tomography and intravascular images through virtual catheter path optimization

Karim Kadry , Abhishek Karmakar , Andreas Schuh , Kersten Peterson , Michiel Schaap , David Marlevi , Charles Taylor , Elazer Edelman , Farhad Nezami

Categories: Computer Vision

2022-12-30

Coronary Computed Tomography Angiography (CCTA) provides information on the presence, extent, and severity of obstructive coronary artery disease. Large-scale clinical studies analyzing CCTA-derived metrics typically require ground-truth validation in the form of high-fidelity 3D intravascular imaging. However, manual rigid alignment of intravascular images to corresponding CCTA images is both time-consuming and user-dependent. Moreover, intravascular modalities suffer from several non-rigid, motion-induced distortions that arise from distortions in the imaging catheter path. To address these issues, we present a semi-automatic, segmentation-based framework for both rigid and non-rigid matching of intravascular images to CCTA images. We formulate the problem in terms of finding the optimal virtual catheter path that samples the CCTA data to recapitulate the coronary artery morphology found in the intravascular image. We validate our co-registration framework on a cohort of n = 40 patients using bifurcation landmarks as ground truth for longitudinal and rotational registration. Our results indicate that our non-rigid registration significantly outperforms other co-registration approaches for luminal bifurcation alignment in both longitudinal (mean mismatch: 3.3 frames) and rotational directions (mean mismatch: 28.6 degrees). By providing a differentiable framework for automatic multi-modal intravascular data fusion, our co-registration modules significantly reduce the manual effort required to conduct large-scale multi-modal clinical studies, while also providing a solid foundation for the development of machine learning-based co-registration approaches.

Controllable Mechanical-domain Energy Accumulators

Sung Y. Kim , David J. Braun

Categories: Robotics

2022-12-29

Springs are efficient in storing and returning elastic potential energy but are unable to hold the energy they store in the absence of an external load. Lockable springs use clutches to hold elastic potential energy in the absence of an external load but have not yet been widely adopted in applications, partly because clutches introduce design complexity, reduce energy efficiency, and typically do not afford high-fidelity control over the energy stored by the spring. Here, we present the design of a novel lockable compression spring that uses a small capstan clutch to passively lock a mechanical spring. The capstan clutch can lock up to 1000 N force at any arbitrary deflection, unlock the spring in less than 10 ms with a control force less than 1 % of the maximal spring force, and provide an 80 % energy storage and return efficiency (comparable to a highly efficient electric motor operated at constant nominal speed). By retaining the form factor of a regular spring while providing high-fidelity locking capability even under large spring forces, the proposed design could facilitate the development of energy-efficient spring-based actuators and robots.
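
The small control force relative to the locked force can be related to the classical capstan (Euler-Eytelwein) equation, T_load = T_hold * exp(mu * theta). The sketch below is a back-of-the-envelope calculation, not the authors' design model. It assumes the clutch behaves like an ideal capstan, and the friction coefficient of 0.3 is a hypothetical value. Only the 1000 N force and the 1 % ratio come from the abstract.

```python
import math


def holding_ratio(friction_coefficient, wrap_angle_rad):
    """Ideal capstan equation: load tension divided by holding tension."""
    return math.exp(friction_coefficient * wrap_angle_rad)


def required_wrap_angle(friction_coefficient, ratio):
    """Wrap angle (radians) needed to reach a given load-to-hold ratio."""
    return math.log(ratio) / friction_coefficient


MU = 0.3                      # assumed friction coefficient (not from the abstract)
MAX_SPRING_FORCE_N = 1000.0   # locked force reported in the abstract
TARGET_RATIO = 100.0          # control force at 1 % of the maximal spring force

angle = required_wrap_angle(MU, TARGET_RATIO)
turns = angle / (2.0 * math.pi)
control_force = MAX_SPRING_FORCE_N / holding_ratio(MU, angle)
```

Under these assumptions, about two and a half turns of wrap would let roughly 10 N of holding force resist a 1000 N load. The exponential dependence on wrap angle is why a small capstan can lock a large force.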
